⚡️ Speed up function get_prompt by 29%
#147
Open
📄 29% (0.29x) speedup for get_prompt in mlflow/genai/judges/prompts/correctness.py
⏱️ Runtime: 659 microseconds → 510 microseconds (best of 82 runs)
📝 Explanation and details
The optimized code achieves a 29% speedup by replacing repeated regex compilation with cached precompiled patterns in the format_prompt function.
Key optimization: The original code compiled a regex pattern from scratch for every template variable replacement (re.sub(r"\{\{\s*" + key + r"\s*\}\}", replacement, prompt)). The optimized version introduces _compiled_key_pattern() with @lru_cache(maxsize=64), which compiles and caches a regex pattern for each unique key.
Why this works: Regex compilation is computationally expensive. The line profiler shows that the regex substitution line consumed 89.7% of the total time in the original format_prompt function. By caching compiled patterns, repeated calls with the same template keys (like "input", "output", "ground_truth") reuse precompiled patterns instead of recompiling them each time.
Performance impact: The optimization is particularly effective when the same template keys are substituted repeatedly, since each key's pattern, including its re.escape() preprocessing, is built only once and then served from the cache.
Test case benefits: All test cases show consistent 15-40% improvements, with the largest gains in basic scenarios (30-40%) and slightly smaller but still significant gains in large-scale tests (14-30%), where the overhead of string manipulation becomes more dominant relative to regex compilation.
The optimization maintains identical behavior while eliminating redundant regex compilation overhead, making it especially valuable for prompt formatting workloads where templates are reused frequently.
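As a minimal sketch of the cached-pattern approach described above: the _compiled_key_pattern name and the @lru_cache(maxsize=64) decorator come from the explanation, while the surrounding format_prompt wrapper, its signature, and its replacement handling are simplified assumptions rather than the exact mlflow implementation.

import re
from functools import lru_cache
from typing import Any


@lru_cache(maxsize=64)
def _compiled_key_pattern(key: str) -> re.Pattern:
    # Compile the "{{ key }}" pattern once per unique key; later calls with the
    # same key (e.g. "input", "output", "ground_truth") reuse the cached pattern.
    return re.compile(r"\{\{\s*" + re.escape(key) + r"\s*\}\}")


def format_prompt(prompt: str, **values: Any) -> str:
    # Original approach recompiled the pattern on every substitution:
    #   prompt = re.sub(r"\{\{\s*" + key + r"\s*\}\}", replacement, prompt)
    # The cached variant looks up the precompiled pattern instead.
    for key, value in values.items():
        # A callable replacement avoids re-interpreting backslashes in the value
        # as regex escape sequences.
        prompt = _compiled_key_pattern(key).sub(lambda _m, v=str(value): v, prompt)
    return prompt


# Example: repeated formatting of the same template reuses the cached patterns.
template = "question: {{ input }}\nanswer: {{ output }}"
print(format_prompt(template, input="What is 2+2?", output="4"))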
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
import re
from typing import Any
# imports
import pytest
from mlflow.genai.judges.prompts.correctness import get_prompt
CORRECTNESS_PROMPT_INSTRUCTIONS = """
Consider the following question, claim and document. You must determine whether the claim is
supported by the document in the context of the question. Do not focus on the correctness or
completeness of the claim. Do not make assumptions, approximations, or bring in external knowledge.
<question>{{input}}</question>
<claim>{{ground_truth}}</claim>
<document>{{input}} - {{output}}</document>
"""
CORRECTNESS_PROMPT_OUTPUT = """
Please indicate whether each statement in the claim is supported by the document in the context of the question using only the following json format. Do not use any markdown formatting or output additional lines.
{
"rationale": "Reason for the assessment. If the claim is not fully supported by the document in the context of the question, state which parts are not supported. Start each rationale with
Let's think step by step","result": "yes|no"
}
""" # noqa: E501
CORRECTNESS_PROMPT = CORRECTNESS_PROMPT_INSTRUCTIONS + CORRECTNESS_PROMPT_OUTPUT
CORRECTNESS_PROMPT_SUFFIX = """
If the claim is fully supported by the document in the context of the question, you must say "The response is correct" in the rationale. If the claim is not fully supported by the document in the context of the question, you must say "The response is not correct".""" # noqa: E501
from mlflow.genai.judges.prompts.correctness import get_prompt
# unit tests
# ----------- Basic Test Cases -----------
def test_basic_with_expected_response():
# Basic scenario: all fields provided
request = "What is the capital of France?"
response = "The capital of France is Paris."
expected_response = "Paris"
codeflash_output = get_prompt(request, response, expected_response); prompt = codeflash_output # 11.4μs -> 8.10μs (41.1% faster)
def test_basic_with_expected_facts():
# Basic scenario: expected_facts provided, expected_response omitted
request = "List three colors."
response = "Red, Green, Blue."
expected_facts = ["Red", "Green", "Blue"]
codeflash_output = get_prompt(request, response, expected_response=None, expected_facts=expected_facts); prompt = codeflash_output # 13.1μs -> 9.94μs (31.5% faster)
# Check that each fact is present in the ground_truth tag, formatted as bullet points
for fact in expected_facts:
pass
def test_basic_with_both_expected_response_and_facts():
# When both are provided, expected_response takes precedence
request = "Name a mammal."
response = "Elephant."
expected_response = "Elephant"
expected_facts = ["Elephant", "Whale"]
codeflash_output = get_prompt(request, response, expected_response=expected_response, expected_facts=expected_facts); prompt = codeflash_output # 11.4μs -> 8.82μs (28.9% faster)
def test_basic_with_no_expected_response_or_facts():
# Both expected_response and expected_facts omitted
request = "What is 2+2?"
response = "4"
codeflash_output = get_prompt(request, response); prompt = codeflash_output # 11.4μs -> 8.36μs (36.8% faster)
# ----------- Edge Test Cases -----------
def test_edge_empty_strings():
# All fields are empty strings
codeflash_output = get_prompt("", "", expected_response="", expected_facts=[]); prompt = codeflash_output # 11.4μs -> 8.69μs (31.0% faster)
def test_edge_none_expected_response_and_facts():
# expected_response and expected_facts both None
codeflash_output = get_prompt("Q", "A", expected_response=None, expected_facts=None); prompt = codeflash_output # 11.4μs -> 8.67μs (31.9% faster)
def test_edge_empty_expected_facts_list():
# expected_facts is empty list, expected_response is None
codeflash_output = get_prompt("Q", "A", expected_response=None, expected_facts=[]); prompt = codeflash_output # 11.4μs -> 8.48μs (34.5% faster)
def test_edge_expected_facts_with_empty_string():
# expected_facts contains empty string
codeflash_output = get_prompt("Q", "A", expected_response=None, expected_facts=[""]); prompt = codeflash_output # 12.2μs -> 9.50μs (27.9% faster)
def test_edge_special_characters_in_inputs():
# Inputs contain special characters and backslashes
request = "What is the path to C:\Windows?"
response = "C:\Windows is the default."
expected_response = "C:\Windows"
codeflash_output = get_prompt(request, response, expected_response); prompt = codeflash_output # 18.8μs -> 15.9μs (18.3% faster)
def test_edge_long_expected_response():
# Very long expected_response string
long_text = "A" * 500
codeflash_output = get_prompt("Q", "A", expected_response=long_text); prompt = codeflash_output # 11.8μs -> 8.81μs (34.2% faster)
def test_edge_multiline_expected_response():
# expected_response is multiline
expected_response = "Line1\nLine2\nLine3"
codeflash_output = get_prompt("Q", "A", expected_response=expected_response); prompt = codeflash_output # 11.3μs -> 8.51μs (32.9% faster)
def test_edge_multiline_expected_facts():
# expected_facts contains multiline strings
expected_facts = ["Fact1\nSubfactA", "Fact2\nSubfactB"]
codeflash_output = get_prompt("Q", "A", expected_response=None, expected_facts=expected_facts); prompt = codeflash_output # 12.1μs -> 9.68μs (24.7% faster)
for fact in expected_facts:
pass
def test_edge_expected_facts_with_special_chars():
# expected_facts contains special characters
expected_facts = ["", "&symbol", "100%"]
codeflash_output = get_prompt("Q", "A", expected_response=None, expected_facts=expected_facts); prompt = codeflash_output # 12.6μs -> 9.74μs (29.1% faster)
for fact in expected_facts:
pass
def test_edge_expected_facts_and_empty_expected_response():
# expected_facts provided, expected_response is empty string
expected_facts = ["Alpha", "Beta"]
codeflash_output = get_prompt("Q", "A", expected_response="", expected_facts=expected_facts); prompt = codeflash_output # 12.8μs -> 9.85μs (29.6% faster)
# Should use expected_facts for the ground_truth tag
for fact in expected_facts:
pass
# ----------- Large Scale Test Cases -----------
def test_large_expected_facts_list():
# Large number of expected facts
expected_facts = [f"Fact{i}" for i in range(500)]
codeflash_output = get_prompt("Q", "A", expected_response=None, expected_facts=expected_facts); prompt = codeflash_output # 19.4μs -> 16.3μs (19.3% faster)
# All facts should be present as bullet points
for fact in expected_facts:
pass
def test_large_expected_response():
# Large expected_response string
expected_response = " ".join([f"Word{i}" for i in range(500)])
codeflash_output = get_prompt("Q", "A", expected_response=expected_response); prompt = codeflash_output # 12.9μs -> 9.81μs (31.7% faster)
def test_large_request_and_response():
# Large request and response strings
request = "Q" * 500
response = "A" * 500
codeflash_output = get_prompt(request, response, expected_response="Result"); prompt = codeflash_output # 13.2μs -> 10.2μs (28.7% faster)
def test_large_expected_facts_and_expected_response():
# Large expected_facts, but expected_response is provided (should take precedence)
expected_facts = [f"Fact{i}" for i in range(500)]
expected_response = "MainResult"
codeflash_output = get_prompt("Q", "A", expected_response=expected_response, expected_facts=expected_facts); prompt = codeflash_output # 11.5μs -> 8.64μs (33.5% faster)
def test_large_all_fields_empty():
# Large empty lists/strings
codeflash_output = get_prompt("", "", expected_response="", expected_facts=[]); prompt = codeflash_output # 12.1μs -> 8.85μs (37.2% faster)
# ----------- Determinism and Consistency -----------
def test_deterministic_output():
# Same inputs should always produce the same output
request = "Q"
response = "A"
expected_response = "R"
codeflash_output = get_prompt(request, response, expected_response); prompt1 = codeflash_output # 11.5μs -> 8.30μs (38.4% faster)
codeflash_output = get_prompt(request, response, expected_response); prompt2 = codeflash_output # 4.37μs -> 3.15μs (38.8% faster)
def test_order_of_expected_facts_preserved():
# Order of expected_facts should be preserved in output
expected_facts = ["First", "Second", "Third"]
codeflash_output = get_prompt("Q", "A", expected_response=None, expected_facts=expected_facts); prompt = codeflash_output # 12.6μs -> 9.53μs (31.8% faster)
idx_first = prompt.index("- First")
idx_second = prompt.index("- Second")
idx_third = prompt.index("- Third")
# ----------- Mutation Testing Guards -----------
def test_mutation_ground_truth_priority():
# If both expected_response and expected_facts are provided, expected_response must be used
expected_facts = ["ShouldNotAppear"]
expected_response = "ShouldAppear"
codeflash_output = get_prompt("Q", "A", expected_response=expected_response, expected_facts=expected_facts); prompt = codeflash_output # 11.7μs -> 8.51μs (37.2% faster)
def test_mutation_suffix_only_with_expected_facts():
# Suffix is only present when expected_facts is provided and expected_response is None or empty
codeflash_output = get_prompt("Q", "A", expected_response="NonEmpty", expected_facts=["Fact"]); prompt1 = codeflash_output # 11.7μs -> 8.74μs (34.0% faster)
codeflash_output = get_prompt("Q", "A", expected_response=None, expected_facts=["Fact"]); prompt2 = codeflash_output # 5.48μs -> 4.60μs (19.2% faster)
def test_mutation_empty_expected_facts_no_suffix():
# Suffix should not be present if expected_facts is empty
codeflash_output = get_prompt("Q", "A", expected_response=None, expected_facts=[]); prompt = codeflash_output # 11.6μs -> 8.49μs (36.8% faster)
def test_mutation_expected_facts_none_no_suffix():
# Suffix should not be present if expected_facts is None
codeflash_output = get_prompt("Q", "A", expected_response=None, expected_facts=None); prompt = codeflash_output # 11.6μs -> 8.61μs (34.8% faster)
codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import re
from typing import Any
# imports
import pytest
from mlflow.genai.judges.prompts.correctness import get_prompt
CORRECTNESS_PROMPT_INSTRUCTIONS = """
Consider the following question, claim and document. You must determine whether the claim is
supported by the document in the context of the question. Do not focus on the correctness or
completeness of the claim. Do not make assumptions, approximations, or bring in external knowledge.
<question>{{input}}</question>
<claim>{{ground_truth}}</claim>
<document>{{input}} - {{output}}</document>
"""
CORRECTNESS_PROMPT_OUTPUT = """
Please indicate whether each statement in the claim is supported by the document in the context of the question using only the following json format. Do not use any markdown formatting or output additional lines.
{
"rationale": "Reason for the assessment. If the claim is not fully supported by the document in the context of the question, state which parts are not supported. Start each rationale with
Let's think step by step","result": "yes|no"
}
""" # noqa: E501
CORRECTNESS_PROMPT = CORRECTNESS_PROMPT_INSTRUCTIONS + CORRECTNESS_PROMPT_OUTPUT
CORRECTNESS_PROMPT_SUFFIX = """
If the claim is fully supported by the document in the context of the question, you must say "The response is correct" in the rationale. If the claim is not fully supported by the document in the context of the question, you must say "The response is not correct".""" # noqa: E501
from mlflow.genai.judges.prompts.correctness import get_prompt
# unit tests
# ----------- BASIC TEST CASES -----------
def test_basic_expected_response_only():
"""Basic: Only expected_response is provided, no expected_facts."""
codeflash_output = get_prompt("What is the capital of France?", "Paris is the capital.", expected_response="Paris"); result = codeflash_output # 12.2μs -> 9.05μs (34.3% faster)
def test_basic_expected_facts_only():
"""Basic: Only expected_facts is provided, expected_response is None."""
facts = ["Paris is the capital of France", "France is in Europe"]
codeflash_output = get_prompt("What is the capital of France?", "Paris is the capital.", expected_facts=facts); result = codeflash_output # 12.8μs -> 10.4μs (23.8% faster)
# The ground_truth must be a bullet list of facts, and suffix should be present
for fact in facts:
pass
def test_basic_both_expected_response_and_facts():
"""Basic: Both expected_response and expected_facts provided. Only expected_response is used."""
facts = ["Paris is the capital of France", "France is in Europe"]
codeflash_output = get_prompt("What is the capital of France?", "Paris is the capital.", expected_response="Paris", expected_facts=facts); result = codeflash_output # 11.5μs -> 8.89μs (29.8% faster)
for fact in facts:
pass
def test_basic_no_expected_response_or_facts():
"""Basic: Neither expected_response nor expected_facts provided."""
codeflash_output = get_prompt("What is the capital of France?", "Paris is the capital."); result = codeflash_output # 11.5μs -> 8.50μs (34.8% faster)
# ----------- EDGE TEST CASES -----------
def test_edge_empty_strings_everywhere():
"""Edge: All input strings are empty."""
codeflash_output = get_prompt("", "", expected_response="", expected_facts=[]); result = codeflash_output # 11.6μs -> 8.49μs (37.0% faster)
def test_edge_expected_facts_is_empty_list():
"""Edge: expected_facts is an empty list, expected_response is None."""
codeflash_output = get_prompt("Q", "A", expected_facts=[]); result = codeflash_output # 11.7μs -> 8.66μs (35.2% faster)
def test_edge_expected_facts_is_none():
"""Edge: expected_facts is None, expected_response is None."""
codeflash_output = get_prompt("Q", "A"); result = codeflash_output # 11.1μs -> 8.31μs (34.0% faster)
def test_edge_expected_response_is_none_but_facts_present():
"""Edge: expected_response is None, expected_facts is non-empty."""
facts = ["Fact 1", "Fact 2"]
codeflash_output = get_prompt("Q", "A", expected_response=None, expected_facts=facts); result = codeflash_output # 12.2μs -> 9.45μs (28.9% faster)
# The claim should be a bullet list, suffix present
for fact in facts:
pass
def test_edge_expected_response_is_empty_string_and_facts_present():
"""Edge: expected_response is '', expected_facts is non-empty."""
facts = ["Fact 1", "Fact 2"]
codeflash_output = get_prompt("Q", "A", expected_response="", expected_facts=facts); result = codeflash_output # 12.8μs -> 9.78μs (31.2% faster)
# The claim should be a bullet list, suffix present
for fact in facts:
pass
def test_edge_expected_response_is_nonempty_and_facts_empty():
"""Edge: expected_response is non-empty, expected_facts is empty list."""
codeflash_output = get_prompt("Q", "A", expected_response="Something", expected_facts=[]); result = codeflash_output # 11.6μs -> 8.69μs (33.5% faster)
def test_edge_expected_response_is_none_and_facts_none():
"""Edge: Both expected_response and expected_facts are None."""
codeflash_output = get_prompt("Q", "A", expected_response=None, expected_facts=None); result = codeflash_output # 11.5μs -> 8.86μs (30.1% faster)
def test_edge_special_characters_in_input():
"""Edge: Special characters in input, response, and expected_response."""
request = "What is 2 + 2?\nExplain with symbols: <>&"'\"
response = "2 + 2 = 4. Symbols: <>&"'\"
expected_response = "4"
codeflash_output = get_prompt(request, response, expected_response=expected_response); result = codeflash_output # 18.0μs -> 15.3μs (17.4% faster)
def test_edge_backslashes_in_expected_response_and_facts():
"""Edge: Backslashes in expected_response and expected_facts."""
expected_response = "C:\Users\Test"
expected_facts = ["Path is C:\Users\Test", "Another \ fact"]
codeflash_output = get_prompt("Q", "A", expected_response=expected_response); result1 = codeflash_output # 16.0μs -> 13.8μs (16.1% faster)
codeflash_output = get_prompt("Q", "A", expected_facts=expected_facts); result2 = codeflash_output # 8.22μs -> 6.91μs (19.1% faster)
def test_edge_long_expected_response_and_fact():
"""Edge: Very long expected_response and fact."""
long_string = "a" * 500
facts = [long_string, long_string]
codeflash_output = get_prompt("Q", "A", expected_response=long_string, expected_facts=facts); result = codeflash_output # 11.8μs -> 9.03μs (30.9% faster)
for fact in facts:
pass
def test_edge_expected_facts_with_empty_string_items():
"""Edge: expected_facts contains empty strings."""
facts = ["", "Fact 2"]
codeflash_output = get_prompt("Q", "A", expected_facts=facts); result = codeflash_output # 12.5μs -> 9.89μs (26.1% faster)
def test_edge_expected_facts_with_whitespace_items():
"""Edge: expected_facts contains whitespace-only strings."""
facts = [" ", "\t", "\n"]
codeflash_output = get_prompt("Q", "A", expected_facts=facts); result = codeflash_output # 12.6μs -> 9.95μs (27.0% faster)
# ----------- LARGE SCALE TEST CASES -----------
def test_large_scale_many_facts():
"""Large scale: expected_facts contains 999 items."""
facts = [f"Fact {i}" for i in range(999)]
codeflash_output = get_prompt("Q", "A", expected_facts=facts); result = codeflash_output # 25.7μs -> 22.4μs (14.3% faster)
# All facts should appear in the result, suffix present
for i in range(999):
pass
def test_large_scale_long_strings():
"""Large scale: Very long strings for request, response, expected_response."""
long_request = "Q" * 500
long_response = "A" * 500
long_expected_response = "E" * 500
codeflash_output = get_prompt(long_request, long_response, expected_response=long_expected_response); result = codeflash_output # 13.7μs -> 10.6μs (30.2% faster)
def test_large_scale_facts_and_expected_response():
"""Large scale: Both expected_response and expected_facts are large, only expected_response used."""
facts = [f"Fact {i}" for i in range(999)]
expected_response = "Expected" * 100
codeflash_output = get_prompt("Q", "A", expected_response=expected_response, expected_facts=facts); result = codeflash_output # 11.7μs -> 9.02μs (30.0% faster)
for i in range(999):
pass
def test_large_scale_empty_facts_and_long_expected_response():
"""Large scale: expected_facts empty, expected_response long string."""
expected_response = "X" * 999
codeflash_output = get_prompt("Q", "A", expected_response=expected_response, expected_facts=[]); result = codeflash_output # 12.1μs -> 9.41μs (29.0% faster)
def test_large_scale_all_empty():
"""Large scale: All inputs empty or empty lists."""
codeflash_output = get_prompt("", "", expected_response="", expected_facts=[]); result = codeflash_output # 11.5μs -> 8.44μs (36.0% faster)
# ----------- FUNCTIONALITY & MUTATION TEST CASES -----------
def test_mutation_expected_response_priority_over_facts():
"""Mutation: If both expected_response and expected_facts are provided, only expected_response is used."""
facts = ["Fact 1", "Fact 2"]
codeflash_output = get_prompt("Q", "A", expected_response="ER", expected_facts=facts); result = codeflash_output # 11.8μs -> 8.60μs (37.1% faster)
for fact in facts:
pass
def test_mutation_suffix_only_with_facts_and_no_expected_response():
"""Mutation: Suffix only appears when facts are provided and expected_response is None or empty."""
facts = ["Fact 1"]
codeflash_output = get_prompt("Q", "A", expected_facts=facts); result1 = codeflash_output # 12.3μs -> 9.40μs (30.6% faster)
codeflash_output = get_prompt("Q", "A", expected_response="ER", expected_facts=facts); result2 = codeflash_output # 5.25μs -> 3.84μs (36.6% faster)
def test_mutation_empty_expected_response_and_facts():
"""Mutation: If both expected_response and expected_facts are empty/None, claim must be empty, no suffix."""
codeflash_output = get_prompt("Q", "A", expected_response="", expected_facts=[]); result = codeflash_output # 11.3μs -> 8.66μs (30.7% faster)
def test_mutation_expected_facts_none_and_expected_response_empty():
"""Mutation: If expected_facts is None and expected_response is '', claim must be empty, no suffix."""
codeflash_output = get_prompt("Q", "A", expected_response="", expected_facts=None); result = codeflash_output # 11.4μs -> 8.75μs (30.1% faster)
codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from mlflow.genai.judges.prompts.correctness import get_prompt
def test_get_prompt():
get_prompt('', '', expected_response=None, expected_facts=[''])
def test_get_prompt_2():
get_prompt('', '', expected_response='', expected_facts=[])
To edit these changes, git checkout codeflash/optimize-get_prompt-mhuss9lk and push.